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Abstract: The use of model organisms as tools for the 
investigation of human genetic variation has significantly 
and rapidly advanced our understanding of the aetiolo- 
gies underlying hereditary traits. However, while equiva- 
lences In the DNA sequence of two species may be readily 
inferred through evolutionary models, the Identification 
of equivalence In the phenotypic consequences resulting 
from comparable genetic variation Is far from straightfor- 
ward, limiting the value of the modelling paradigm. In this 
review, we provide an overview of the emerging statistical 
and computational approaches to objectively Identify 
phenotypic equivalence between human and model 
organisms with examples from the vertebrate models, 
mouse and zebrafish. Firstly, we discuss enrichment 
approaches, which deem the most frequent phenotype 
among the orthologues of a set of genes associated with 
a common human phenotype as the orthologous 
phenotype, or phenolog. In the model species. Secondly, 
we Introduce and discuss computational reasoning 
approaches to identify phenotypic equivalences made 
possible through the development of intra- and interspe- 
cies ontologies. Finally, we consider the particular 
challenges involved in modelling neuropsychiatric disor- 
ders, which Illustrate many of the remaining difficulties In 
developing comprehensive and unequivocal interspecies 
phenotype mappings. 



Introduction 

Given a candidate gene mutation thouglit to underlie a human 
phenotype, a question commonly asked by human geneticists 
investigating this candidacy is, "Does a mutation in this gene have 
a comparable effect in another species?" To answer this, animal 
models have either been made or identified that possess a genetic 
aetiology relevant to a human disorder. These models have proved 
themselves incredibly useful by (i) allowing repeated observations 
of pathologies germane to often-rare, human genetic disorders 
within an environmentally and genetically controlled background; 
(ii) enabling observations of early stages of a disorder that are often 
presymptomatic in humans; (iri) offering access to tissues not 
normally available from human patients; and (iv) providing a 
platform for therapeutic development and testing. 

For many decades, the study in a model organism of the 
equivalent gene, or orthologue, of a gene associated with human 
phenotypic traits has delivered enormous gains in understanding 
[1]. Animal models carrying null mutations, or knock-outs, in the 
orthologues of human Mendelian disease genes have rapidly 
advanced our understanding of this particular class of genetic 
disorders, while directed mutagenesis techniques have similarly 
advanced our understanding of penetrant gain-of-function 



mutations. The ready-made, often-systematic availability of 
animals carrying a wide range of determined disruptions has 
enabled more resources to be focused on the analysis of the model 
rather than its generation, and projects such as the International 
Mouse Phenotyping Consortium are promising to revolutionise 
our understanding of the molecular basis of human disease by 
providing systematic and standardised analyses of the phenotypic 
relevance of nearly all mouse genes [2-8] . 

With the availability of ever more phenotype data from model 
organisms, the issue of what computational and algorithmic 
resources will be required to make optimal use of the data is 
becoming progressively more pressing. In this review, we wiU 
discuss how phenotypes can be mapped between humans and 
model species and provide a selective overview of successful 
approaches to cross-species phenotype mapping. Finally, we wiU 
focus on the area of neurobehavioral phenotypes, which is perhaps 
the most difficult of all classes of phenotypes to map between 
species and is representative of the challenges that remain for 
comprehensive cross-species mapping. 

What Is a Phenotype? 

In biology, a widely accepted definition of phenotype is, "The 
observable traits of an organism." In medical contexts, however, 
the word "phenotype" is more often used to refer to some 
deviation from normal morphology, physiology, or behaviour, and 
this is the definition that we will use here. Thus, physicians 
characterise the phenotype of their patients (although they rarely 
speak of it in this way) by taking a medical history or by means of a 
physical examination, diagnostic imaging, blood tests, psycholog- 
ical testing, and so on, in order to make the diagnosis [9]. 
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In some contexts, the word "phenotype" is commonly used to 
refer to a disease entity. However, it is important to distinguish 
between diseases and phenotypic features. A disease usually has 
multiple phenotypic features; e.g., the disease "common cold" can 
have the features "sneezing," "runny nose," "fatigue," and 
"fc\Tr." On the other hand, a feature can occur with multiple 
diseases. For instance, "fever" occurs not only with the common 
cold, but also with hyperthyroidism, leukaemia, rheumatoid 
arthritis, and many other infectious and non-infectious diseases. 
Thus, there is a complex, many-to-many relationship between 
diseases and phenotypic features, which likely reflects the 
underlying pleiotropy of biochemical pathways and cellular 
networks. 

From Gene to Phenotype 

Perhaps the most obvious starting point for mapping pheno- 
types between species is to investigate animal models with a 
mutation in a gene that is orthologous to a human gene associated 
with a disease (Figure 1 A). Geneticists invoke evolution to bestow a 
degree of universality to the function of a gene, inferring that 
similarity in the encoded protein sequences implies similarity in 
function, and that function is most likely to be conserved between 
unique, 1:1 orthologous genes [10,11]. However, the expectation 
that an equivalent mutation in an orthologous pair of genes wiU 
yield the same phenotype in two different species fails to 
acknowledge the diflFerences that define distinct species. A 
phenot^'pe is an often compk'x and emergent property of a 
bit)logical system that is usually influencf;d directly and indirectly 
by many genes. Even for highly penetrant mutations in close and 
weU-conserved orthologues, significant differences in outcomes 
have been observed; neither the disruption of iffiftT(Lesch-Nyhan 
syndrome) nor mutations in DMD (Duchenne's muscular dystro- 
phy) give strong pheno types in the mouse [12,13]. Phenotypic 
differences may be observed more frequendy when comparing 
systems, for example, immunity, that are rapidly evolving and/ or 
subject to large environmental influence, the latter obviously not 
well modelled through a laboratory upbringing [14]. However, to 
dwell on these differences would be to deliberately ignore the 
many more examples of animal models that have yielded 
considerable insight into human genetic disease. For instance, at 
present, 3,829 mouse models associated to human diseases are 
Usted in the Mouse Genome Database [15] (http://www. 
informatics.jax.org/ vocab/omim). 

Despite obvious species differences, phenotypic equivalences can 
be objectively discovered. The orthologues of genes that function 
together in a particular molecular pathway often also function 
together in the orthologous pathway in another species even when 
separated by a considerable evolutionary distance [16]. As 
disruptions to different genes that operate within the same pathway 
often produce similar phenotypes [17], disruptions of the ortholo- 
gues of genes that yield a given phenotype in human can plausibly 
be predicted to yield the equivalent phenotype in a model organism 
if they are disrupting the orthologous pathway (Figure IB). 

Marcotte and colleagues systematically demonstrated this 
equivalence by forming groups of human genes that shared a 
human phenotype and then asking whether there was an unusually 
common phenotype amongst any one group's orthologues in 
another species [18]. They termed these evolutionary phenotypic 
associations between groups of orthologous genes "phenologs." 
The thousands of phenologs discoverable through this approach 
included over 150 identified between human and yeast, a 
di\'ergen(:e of over 1.5 billion years. Marcotte and colleagues 
demonstrated that this objective approach could identify 
non-obvious phenologs that were of significant predictive value. 



showing, for example, that genes associated with lovastin 
sensitivity in yeast, the phenolog of abnormal angiogenesis in 
mice, were indeed involved in vasculature formation in Xenopus. In 
a similar approach, Webber and colleagues were able to 
objectively map phenotypes between human and mouse by 
examining the genes affected by mutations in individuals with 
neurodevelopmental phenotypes [19,20]. However, while the 
phenolog associations revealed by these approaches are often 
relevant, they may not be the most specific. For example, while 
individuals with psychosis harbour mutations that are enriched in 
the orthologues of genes associated with the phenotype abnormal 
prepulse inhibition in the mouse, an abnormal prepulse inhibition is 
not synonymous with psychosis in humans [21]. The many-to- 
many relationship between genes and phenotypes makes the 
process of reliably mapping human phenotypes through phenologs 
vulnerable to pleiotropic effects and genetic interactions [22]. 

Perhaps the greatest difficulty in comprehensively mapping 
phenotypes lies in the necessary assumption that genes whose 
function is not associated with a particular phenotype have been 
examined and found not to influence that phenotype: this 
assumption is prolificaUy untrue, with only a fraction of possible 
phenotypes examined for only a minority of non-randomly 
selected gcnc's, particularly in species such as the mouse and 
zebrafish that are less amenable to large-scale screening. To 
address this, it would be particularly valuable to identify putative 
phenologs in organisms amenable to high-throughput screening 
and thereby obtain systematic coverage (see Box 1). However, the 
ability to identify equivalent phenotypes between different species 
allows one to use genotype-phenotype associations discovered in 
one species to infer unexamined associations in another. 

Phenotype Ontologies 

Other approaches to computational cross-species phenotype 
analysis do not begin with the identification of orthologous genes 
but rather directly estimate the similarity between phenotypic 
abnormalities seen in human disease and animal models. 
Ontologies have become an indispensable tool to measure cross- 
species phenotypic similarity. An ontology is a representation of 
knowledge that uses a controlled vocabulary to enable knowledge 
sharing and computer reasoning. "Ontology" was famously 
defined as a specification of a conceptualization [23], meaning 
that an ontology provides a representation of the concepts of a 
domain of knowledge (conceptualization) together with the 
semantic relations between them (specification). Ontologies can 
be used to represent items of a domain of knowledge, for example, 
the Chemical Entities of Biological Interest (GhEBI) ontology 
provides a comprehensive representation of biologically relevant 
small molecules [24], but also to represent the attributes of domain 
concepts. Perhaps the most well-known ontology of this type is the 
Gene Ontology [25], which describes the functions, roles, and 
locations of gene products. Similarly, phenotype ontologies 
describe the phenotypic abnormalities associated with diseases or 
found in individual patients or model organisms. In this review, we 
win concentrate on the use of the Human Phenotype Ontology 
[26,27] (HPO) to describe human genetic disease, and the 
Mammalian Phenotype Ontology (MPO) [28,29] to describe 
genetically modified mouse models [30]. Each ontology consists of 
thousands of terms, each of which represents a single phenotypic 
abnormality such as "atrial septal defect." The terms in the HPO 
and MPO are related to one another by subclass ("is a") relations, 
such that the ontology can be represented as a so-called "directed 
acyclic graph." This structure enables annotation propagation 
whereby more specific phenotypic terms are also described by 
more general parent terms and, thus, all ancestral terms. For 
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Figure 1. Interspecies phenotype mapping strategies. This review highlights three major methodologies to identify phenotypes in the mouse 
that are relevant to a human disease. (A) Classical approach. A mouse model is made or identified that possesses a genotype equivalent to a 
penetrant mutation that in human underlies the disease of interest (termed construct validity). The mouse model is examined for phenotypes that 
resemble those that define the human disorder (face validity). (B) Phenolog mapping. A group is formed containing candidate genes for a disease of 
interest. The respective mouse models for the orthologues of these genes are then examined for any unusually overrepresented phenotypes among 
them and these phenotypes (termed phenologs) are deemed relevant to the disease. (C) Direct phenotype mapping. Given the phenotype(s) that 
describe a human disease, the corresponding phenotypes in mouse are inferred by means of computational reasoning using interspecies phenotype 
ontology analysis. In the example shown, the HPO term Aortic stenosis is defined on the basis of the PATO term constricted and aortic valve (term from 
the Foundational Model of Anatomy ontology of human anatomy [35]). Similarly, the MPO term aortic valve stenosis is defined using the same PATO 
term constricted and aortic valve (term from the IVlouse Anatomy ontology [77]). Since both the IVlouse Anatomy and FMA terms for aortic valve are 
children of the cross-species anatomy ontology (Uberon [40]) term for aortic valve, automatic reasoning places the HPO term Aortic stenosis and the 
IVlPO term aortic valve stenosis in the direct vicinity of one another in a cross-species phenotype ontology [42]. Therefore, these terms display a high 
semantic similarity to one another. 
doi:1 0.1 371 /journal.pgen.1 004268.g001 



instance, if a patient lias an ahnormality of the cerebellum, he or she can 
also be said to have an ahnormality of the hindbrain, a term that is an 
ancestor of abnormality of the cerebellum (Figure 2). 

The phenotypic terms themselves do not describe any specific 
disease but may be used to list the phenotypic features that 
characterise a particular disease. For instance, to assert that 
patients with neurofibromatosis type I have Lisch nodules of the 
iris, we annotate the disease neurofibromatosis type I with the 
corresponding HPO term, "Lisch Nodules." Mouse models that 
display a given phenotypic abnormality are annotated to IVlPO 
terms in an analogous fashion. The network of diseases, associated 



phenotypic features, and genes can now be used for a number of 
purposes, including differential diagnostics, prioritization of 
candidate genes, and research into the relationships between 
genotype and phenotype (Figures 1 and 3). 

For some applications, it is sufficient to assert that a given 
genetically modified mouse is a good model of some human 
disease. For instance, mouse genotypes that have been used to 
study specific human diseases are curated by the Mouse Genome 
Informatics (MGI) group using disease terms found in the Online 
Mendehan Inheritance in Man (OMIM) [31]. While this is useful 
to find models for a specific disease, when starting with a set of 
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Box 1. Comparing Phenotypes with More Distant Species. 



This review has concentrated on interspecies ontology 
analysis of mouse, the model organism with the highest 
number of genes orthologous to human and the highest 
number of explicit models for distinct human diseases. 
However, while mouse models of disease often appear to 
most resemble their human counterparts, other model 
organisms offer important advantages for studying specific 
areas of physiology and disease-related biology. For exam- 
ple, the zebrafish is particularly amenable to understanding 
early development, due to the externally developing, 
transparent embryos and ease of molecular perturbation. 
Large-scale screens of the fruit fly Drosophila melanogaster, 
the zebrafish Danio rerio, and the nematode worm 
Caenorhabditis elegans mutants have been performed for 
several decades, and a large and diverse amount of 
phenotypic information has been collected. These data, 
while inclined towards the specific beneficial features of each 
model system, are complementary to one another and to 
mouse in their scope. Furthermore, unlike with mouse, they 
tend not to be as biased towards the investigation of a 
specific disease, as is often the case for mouse studies. The 
nematode worm C. elegans is used as a model to study 
cellular differentiation and basic biological processes, with 
the developmental fate of each of its up-tol,031 somatic 
cells having been mapped. The relative ease of genetic 
manipulation in C elegans by techniques such as RNA 
interference [80] has enabled large-scale and largely unbi- 
ased investigations of the phenotypic consequences of 
alterations of gene function, and over 420,000 Worm 
Phenotype Ontology (WPO) annotations are available from 
the Wormbase [81]. Similarly, the fruit fly Drosophila is one of 
the most widely used model organisms in genetics since 
Thomas Hunt Morgan's discovery of chromosomes as the 
carriers of genes in D. melanogaster. Currently, over 358,000 
phenotype annotations are available in the model organism 
database for Drosophila genetics, Flybase [82]. 
To support annotation and analysis of these models, many 
model organism consortia are developing phenotype 



ontologies that are amenable to the kind of cross-species 
semantic analysis described in this review, because of their 
use of modular definitions that make use of existing 
ontologies from the Open Biological Ontology (OBO) 
Foundry initiative [41] as building blocks. Depending on 
the species in question, building block ontologies repre- 
senting anatomy, pathology, gene function, embryology, 
biochemistry, and others are used to provide computational 
definitions of phenotypic abnormalities. Interoperable phe- 
notype ontologies and annotations are thus now available 
for human [26], mouse [29], zebrafish [75], nematode worm 
C. elegans [81,83], fruit fly Drosophila [84], rat [85], and fission 
yeast [86]. However, one of the issues is that the nature of 
the genotype-phenotype annotations in each of these 
sources differs. In one source, the phenotypes are linked to 
an allele or gene, whereas in another they might be linked to 
a full genotype. Furthermore, since the ontologies were 
constructed independently and according to different 
principles or focus, relating them requires some sophisticat- 
ed ontological engineering techniques [40]. Interoperation of 
these ontologies and the genotype-phenotype annotations 
is a primary goal of the Monarch Initiative (monarchinitiati- 
ve.org), which provides integrated data and phenotype 
comparison analysis resources that are available to the 
community. Inclusion of these diverse phenotype data bring 
the phenotype coverage up to approximately 80% of human 
genes based on orthology, which may be beneficial for the 
identification of rare and undiagnosed genetic disease 
causes. Additionally, other efforts such as the Phenotype 
Ontology Research Coordination Network (RCN) [87] are 
aiming to develop standards and best practices for accurate 
phenotype representations across a range of plants, verte- 
brates, and arthropods for evolutionary biology. In the 
future, it will be important to improve computational 
methodologies for phenotypic analysis over a large range 
of species to make best use of the advantages that each 
model organism has to offer. 



observed phenotypes the relations described in the HPO or MPO 
allow us to identify all diseases (or models) characterised by those 
phenotypic features. Similarly, by identifying equivalent pheno- 
types between the HPO and the MPO, a search would be able to 
return both relevant mouse models and relevant human diseases. 
In addition to phenolog mapping described above, another 
approach to creating phenotypic equivalences might be to 
manually assign phenotype terms from one ontology to the other. 
For instance, one could assert that the MPO term hypoglycemia 
(MP:0000189) is equivalent to the HPO term Hypoglycemia 
(HP:0001943). While this particular mapping seems perfectly 
reasonable, it is not possible to map every individual term in one 
ontology to the equivalent term in the other ontology; many 
individual phenotypic features do not have a clear match in the 
other species, and the way a phenotype is observed and recorded 
in mice is often quite distinct from phenotypic analysis performed 
in the course of a medical examination. For instance, there is no 
obvious match in humans for the MPO term abnomal tail movements 
(MP:0001391), and there is no obvious match in mice for the HPO 
term Expressive language delay (HP: 0002474). Another important 
issue is that phenotypes elicited in the course of scientific 
experiments on mouse models are not equivalent to medical 
phenotypes. For instance, it is not uncommon to subject mouse 
hearts to ischemia and reperfusion to induce cardiac damage, and 



then to compare the hearts of mice with a certain genetic defect to 
those of wild-type mice. If the mutant mice exhibit larger 
infarctions than the wild-type mice, the MPO term increased 
myocardial infarction size (MP:0Q03037) is used to annotate them. 
Obviously, there is no corresponding HPO term, and in fact it is 
not even entirely clear what the relationship of increased myocardial 
infarction size to the HPO term Myocardial infarction should be. 

For this reason, a different strategy was chosen to develop 
semantic mappings between the HPO and the MPO. A crucial 
part of this strategy is the use of logical definitions to enable 
sophisticated semantic reasoning over ontology terms. Logical 
definitions of phenotype terms use building block ontologies to 
represent the various anatomical, cellular, physiological, and 
metabolic abnormalities, combining them into ontology classes 
using semantic constructs of the ontology language OWL. The 
Phenotype, Attribute, and Trait Ontology (PATO) is a key tool in 
this effort because it provides an abstract representation of the 
abnormal qualities encountered in the phenotypic abnormalities 
(Figwe IC) [32-34]. PATO consists of a single hierarchy of 
qualities designed to be used in conjunction with other ontologies 
representing entities that are the bearers of abnormal phenotypic 
qualities, including the Foundational Model of Anatomy (FMA) 
ontology [35], the Gene Ontology (GO) [36], and the cell ontology 
[37]. Many phenotype terms can be defined using the Entity/ 
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Figure 2. Phenotype ontologies. Phenotype ontologies (an excerpt 
from the Human Phenotype Ontology is shown here) consist of 
thousands of terms describing phenotypes arranged in a hierarchical 
system of subclasses and superclasses. The structure of an ontology 
enables annotation propagation whereby more specific phenotypic 
terms are also described by more general parent terms, and thus all 
ancestral terms. The terms are related to one another by subclass ("is 
a") relations, such that the ontology can be represented as a so-called 
directed acyclic graph. The terms themselves do not describe any 
specific disease. Instead, annotations to terms are used to state that a 
certain disease is characterised by a certain phenotypic feature. 
doi:10.1371/journal.pgen.1004268.g002 

Quality paradigm. In the following example, we consider an HPO 
term that describes increased widtli of the big toe. 

Class: HP:00 10055 

Annotations: label "Broad hallux" 

EquivalentClassOf 

has_part some: 

increased width (PATO_0000600) and 
inheres_m some Big toe FMA:25047 

That is, the HPO class "Broad hallux" is defined as being 
equivalent to a phenotype of increased width that inheres in (is located in) 
a Big toe. Many phenotype terms require more complicated definitions 



that include references to multiple domain ontologies. For instance, 
the following definition of the HPO term Hyperahninemia uses 
references both to the FMA term for blood and to ChEBI for alanine. 

Class: HP: 0003348 

Amiotations: label "Hyperalaninemia" 

EquivalentClassOf: 

has_part some: 

increased concentration (PATO:0001 162) and 
inlieres_in Portion of blood (FMA:9670) and 
towards alanine (CHEBI: 16449) 

These definitions enable interoperability of the HPO with the 
other ontologies in the sense that it becomes possible to search for 
all phenotype terms that involve entities from one of the domain 
ontologies, comprising not only anatomy and small molecules as 
shown above, but also gene function [36], cell types [37], proteins 
[38], pathology [39], and others. Also, thanks to Uberon, an 
integrated cross-species ontology, it is possible to map anatomical 
terms across species [40]. Thus, by using logical definitions for 
mouse and human phenotypes that have been developed using 
interoperable ontologies from the Open Biomedical Ontology 
(OBO) Foundry [41], a common computational basis is created 
that in turn makes it possible to identify equivalent or similar terms 
between phenotype ontologies for different species by using 
automatic reasoning [42-44]. 

A number of different approaches to interspecies phenotype 
mapping have been applied by several groups, and we will refer to 
the original publications for algorithmic details [32,43,45^7]. 
However, for the most part the algorithms make use of logical 
definitions as shown above to identify equivalencies or similarities 
between terms of phenotype ontologies for two or more species 
(Figure IC). Each animal disease model or human disease is then 
annotated to one or more ontology terms. For instance, the human 
disease Marfan syndrome is annotated to a number of HPO terms 
including Tall stature. (HP:0000098), Kyphoscoliosis (HP:0002751), 
Ectopia lentis (HP:0001083), and Aortic root dilatation (HP:0002616). 
The mgR mouse model of Marfan syndrome [48] is annotated to a 
number of MPO terms including increased length of long bones 
(MP:0004695), which is similar to the HPO term Tall stature (which 
in Marfan syndrome results from overgrowth of the long bones); 
kyphosis (MP:0000160), which is similar to the HPO term 
Kyphoscoliosis; and aortic aruurysm (MP:0006278), which is similar 
to the HPO term Aortic root dilatation (an aneurysm is a protruding 
sac formed by the dilation of the wall of the aorta, whereas the 
term Aortic root dilatation refers to an increase in the diameter of the 
proximal section of the aorta). The mgR model does not display an 
ocular phenotype, so there is no obvious match for the HPO term 
Ectopia lentis (a dislocation of the lens of the eye). To calculate a 
phenotypic similarity between the human disease and the mouse 
model, the algorithms mentioned above search over each of the 
terms of the human disease and look for the best match amongst 
the terms used to annotate the mouse model and vice versa. The 
sum of the similarities, which are usually expressed using their 
information content, is then used as a measure of the similarity 
between the diseases. The information content is calculated based 
on the frequency with which a given ontology term is used to 
annotate diseases in a database and thereby provides a way of 
weighting the matches based on the specificity of the phenotypic 
features: the less specific a phenotype is, the lower the information 
content. Many algorithms have been presented to calculate this 
kind of semantic similarity with ontologies, and the field represents 
an area of active research in bioinformatics [49] . Another recent 
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Figure 3. Predicting human genotype-pKienotype relations 
from functional genomics data. The mouse phenotypes associated 
with the orthologues of human genes are a better predictor of genes 
that share human phenotypes than other popular gene annotations of 
the same genes, such as GO or KEGG. As both GO and KEGG include 
information derived from multiple sources, including annotations from 
the mouse, the success of the mouse phenotypes is likely due both to 
the genetic relevance of the mouse models and the fact that human 
and mouse phenotypic annotations both describe abnormalities (see 
Figure 1C). Resnik's [78] measure, together with the GraSM approach 
[79], was used to calculate the similarity of terms organised in these 
hierarchical ontologies, defining the semantic similarity between any 
two terms as the average information content of their disjunct 
common-ancestor terms. Gene pairs were ordered by their semantic 
similarity scores based on either the human KEGG pathway annotations 
(pink circles), human GO biological process (grey circles), or MPO 
annotations to genes (blue circles). For each of KEGG, GO, and MPO 
annotations, gene pairs were ordered in decreasing annotation 
similarity and grouped into bins of 2,000, and then the median 
semantic similarity score between gene pairs' Human Phenotype 
Ontology annotations was calculated. The dashed line marks the 
degree of similarity expected from pairs of random genes. 
doi:1 0.1 371 /journal.pgen.1 004268.g003 

approach to the use of ontologies for diflFerential diagnostics in 
human medicine did not rely on semantic similarity algorithms but 
rather embedded the HPO and the diseases annotated to terms of 
the HPO into a Bayesian network, thereby providing a principled 
framework to deal with noise in phenotypic data and demonstrating 
a substantially improved performance on simulated data [50]. 
Although the field of semantic phenotype matching is still in its 
infancy, even now mouse data are demonstrably better at 
identifying genes that influence the same human phenotypes than 
other commonly used gene annotations such as Gene Ontology or 
Kyoto Encyclopaedia of Genes and Genomes (KEGG) (Figure 3). 
Robinson and colleagues have also recently shown that cross-species 
phenotype matching is a powerful method for the prioritization of 
candidate genes in whole-exome sequencing studies [5 1] . 

Cross-Species Analysis of Behavioural 
Phenotypes and Elucidating the Genetic 
Architecture of Psychiatric Disease 

Behavioural disorders, notably psychiatric disorders, present a 
particularly difficult challenge to both phenotype ontologies and 



cross-species analysis. For example, determining the presence in a 
mouse model of any of the new Diagnostic and Statistical Manual 
of Mental Disorders, Fifth Edition (DSM V) positive symptoms 
required in the diagnosis of schizophrenia ("hallucinations," 
"delusions," or "disorganised speech") is clearly problematic 
[52]. Furthermore, current psychiatric diagnostic classifications 
similarly label patients presenting with a broad spectrum of 
phenotypes, and heterogeneous presentations likely result from 
heterogeneous aetiologies. However, given a well-characterised 
and large cohort of patients harbouring likely highly penetrant 
mutations, relevant mouse model phenotypes can stiU be 
objectively discovered: considering genes affected by de mm 
copy-number variations in 186 individuals with autism, Webber 
and colleagues were able to associate over 40 phenologs which 
were well correlated to the phenotypes already observed in existing 
mouse models of autism- associated genes [53]. However, while the 
association identified between autism and the mouse phenotype 
stereotypic behaviour is readily comparable to the autistic phenotype 
of repetitive behaviours and interests, there were no clear and specific 
associations to the impaired social interaction and verbal and non- 
verbal communication deficits that also define autism [54]. 

Results of genome-wide association studies (GWAS) suggest that 
current approaches to the diagnosis and classification of psychi- 
atric diseases are inadequate. For instance, GWAS findings have 
challenged the traditional distinction between schizophrenia and 
bipolar disorder by identifying genes such as CACNAIC that 
harbour risk alleles for both disorders [55]. Such fmdings did not, 
perhaps, come as a complete surprise given the fact that relatives 
of probands with either disorder have increased risks of both 
schizophrenia and bipolar disorder [56], as well as the well-known 
clinical overlap between the two: patients with bipolar disorder 
can have episodes of psychosis during either manic or depressed 
phases. In fact, more recent findings show that specific single 
nucleotide polymorphisms can be associated with a range of 
psychiatric disorders of childhood or adult onset [57]. 

We join with others to suggest that it may be beneficial to take a 
new approach to the analysis of neurobehavioural disorders that 
will focus on the individual components of the disorder rather than 
just the final diagnostic category [58]. This new approach has two 
potential benefits. While the main clinical purpose of a diagnostic 
category is to allow therapeutic and prognostic decision making, it 
is arguable that the most useful clinical categories, phenotypic 
features, dimensional definitions, and measures for psychiatric 
disease are stiU unknown [55]. Therefore, the act of reducing 
probably heterogeneous groups of patients to a single clinical 
category such as bipolar disorder is likely to reduce the power of 
GWAS or sequencing stucUes to elucidate the molecular pathology 
of psychiatric disease. Studies based on richer representations of 
the phenotype may, in contrast, allow new hypotheses to be tested, 
such as that a certain genetic variant is a risk factor for psychosis, 
rather than schizophrenia or bipolar disorder per se [59]. 

The second potential benefit of this approach for neurobehav- 
ioral clinical research is an improved ability to make use of animal 
models to understand psychiatric disease by allowing more 
accurate interspecies phenotypic comparisons on the basis of 
individual phenotypic aspects of a disorder rather than on complex 
emergent phenomena associated with a disorder. For instance, 
glucocorticoids influence neuronal function in the brain, and are 
thought to be involved in the onset of depression when levels are 
abnormally high [60]. However, it is still unclear how glucocor- 
ticoid signalling is linked to affective disorders. A zebrafish mutant 
with a mutation in the glucocorticoid receptor was shown to 
become immobile ("freeze") and to show reduced exploratory 
behaviour when placed into an unfamiliar aquarium ("novel 
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tank"), abnormalities that could be reversed by tlie addition of the 
antidepressant fluoxetine to the holding water [61]. While it 
appears quite reasonable to infer that this zebrafish is modelling 
some aspect of depressive psychopathology, it is presumably not a 
faithful model of any specific human disorder, such as major 
depressive disorder, with symptoms such as feelings of excessive or 
inappropriate guilt or suicidal ideation. 

The above considerations fit well with the so-called endophe- 
notype concept in psychiatric genetics. An endophenotype in 
psychiatry refers to an internal process that can be objectively 
measured. An endophenotype is an individual feature that may be 
a component of a psychiatric disease. Psychiatric endopheno types 
are defined as being heritable features that tend to manifest in 
individuals with psychiatric diseases whether or not the disease 
itself is active, and that not only cosegregate in families together 
with the disease but also tend to be found in unaffected relatives of 
an individual with a psychiatric disease at a higher rate than in the 
general population [62]. One main reason why endophenotypes 
have attracted attention is the assumption that if an endopheno- 
type represents a more or less atomic component of a complex 
disease entity, then the number of genes required to produce 
variations in these traits may be fewer than those involved in 
producing a psychiatric diagnostic entity, making it easier to 
identify genetic factors for endophenotypes than for disease entities 
[63]. Although a meta-analysis published in 2007, i.e., before the 
publication of large-scale psychiatric GWAS, failed to show an 
advantage for the analysis of endophenotypes in the identification 
of risk alleles for schizophrenia [62], more recent results have 
identified loci significandy associated with various endophenot^'pes 
in schizophrenia [64,6.'^]. However, it should not necessarily be 
assumed that endophenotypes themselves have a simpler genetic 
architecture than psychiatric illnesses. Additionally, what appears 
to be an equivalent endophenotype in human and mouse may 
actually reflect a different pathophysiology. For instance, deficits in 
mouse spatial working memory have recently been reported to be 
based in the hippocampus, rjuestioning the face validity of this 
phenotype for deficits in ^vorking memory associated with scz/bpd 
in humans, the latter based in the frontal cortex [66] . That said, in 
many cases geneticcdly cdtered mice do seem to provide valid 
models for aspects of human psychiatric diseases. For instance, 
schizophrenic patients report oversensitivity to sensory stimulation 
that possibly could be related to the cogniti\'c fragmentation seen 
in this disorder. Experiments with cortical event-related potentials 
and the prepulse inhibition of startle responses have shown that 
schizophrenic patients also have impaired central nervous system 
inhibition (sensorimotor gating) [67]. Correspondingly, neuregulin 
1 [NRGl) is a schizophrenia susceptibility gene in humans, and 
mice lacking any one of the several isoforms of jVrgi display deficits 
in sensorimotor gating [68], among other abnormalities that 
resemble some of the features of human schizophrenia. Similarly, 
the SNAP25 gene has been hnked to schizophrenia in association 
studies [69], and mouse models with abnormalities in SNAP25 
have been shown to have abnormalities in rest and activity 
rhythms, reminiscent of the disturbed sleep patterns observed in 
schizophrenia [70]. Therefore, if appropriate caution is exercised 
in the interpretation of results, a case can be made that it may be 
simpler to investigate the genetic correlates of psychiatric 
endophenotypes in mouse models. Indeed, analogous touch-screen 
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Conclusions 

The human genome project was compared by Victor McKusick 
to the anatomical atlas of Vcsalius published in 1543, in that both 
works provided for the first time a comprehensive list of parts that 
the human body (or genome) contains, but did not actually explain 
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in line. Computational analysis of the phenotype will play a critical 
role in these efforts. In this review, we have highlighted a number 
of computational resources and algorithms that have been 
developed to address current challenges in the field, but it seems 
fair to say that the field of computational phenotype analysis is still 
in its infancy. Nonetheless, computational interspecies phenotype 
analysis will play a crucial role to make full use of the data 
emerging from large-scale projects, such as the International 
Mouse Pheno typing Consortium and the Zebrafish Mutation 
Project, that stand to translate the genomic atlas into functional 
and medical discoveries that will improve our ability to treat 
human disease. 
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